In This Talk

I will walk you through these steps using an example dataset

Find these slides @ https://bit.ly/2lyGAqr

My Background

Originally a worm biologist, now a bioinformatician @ Monash Bioinformatics Platform, and more recently an R-Ladies Melbourne organiser

This talk can be considered ‘Most Useful Things Worm Adele Would Have Liked to Have Known When Starting Out in R’

R & Things I Like About It

Programming language for statistical computing and graphics

Has lots of plotting functionality and is well geared towards data analysis out of the box, with built-in statistical tests
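Here is a small illustration of that out-of-the-box tooling, using only base R. It plots the `mtcars` dataset that ships with R using `boxplot()`, then runs a Welch two-sample t-test with `t.test()`. The dataset and the test were chosen for this sketch and are not from the talk.

```r
# mtcars ships with base R; `am` codes transmission (0 = automatic, 1 = manual)
boxplot(mpg ~ am, data = mtcars,
        xlab = "Transmission (0 = automatic, 1 = manual)",
        ylab = "Miles per gallon")

# Welch two-sample t-test comparing mean mpg between the two groups
result <- t.test(mpg ~ am, data = mtcars)
result$p.value
```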

Well-developed ecosystem of software packages that extend base R for analysis, project management, visualisation, document generation, etc

Continuous active development

Thorough documentation

R Markdown

The marriage of Markdown, a lightweight markup language, and R, a programming language for statistics

An R Markdown file is a plain-text document in which you can embed R code chunks alongside plain-text notes, images and videos.

Structure:

  1. YAML header - the metadata that describes the final document output
  2. Markdown section - content/body of the document - your text/notes, images, links, etc
  3. Code chunks - where the R* code goes

An R Markdown file by itself is quite simple, but it is neatly rendered into a more complex output document
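A minimal sketch of that round trip follows. It writes a tiny `.Rmd` file containing all three parts (YAML header, Markdown text, one R chunk) and renders it to HTML with `rmarkdown::render()`, which is the function RStudio's Knit button calls. The file name `minimal.Rmd` is an arbitrary choice. Rendering also needs pandoc, so the call is guarded with `rmarkdown::pandoc_available()`.

```r
# Write a minimal R Markdown file: YAML header, Markdown text, one R code chunk
rmd_lines <- c(
  "---",
  'title: "Minimal example"',
  "output: html_document",
  "---",
  "",
  "Some **Markdown** text, followed by an R code chunk.",
  "",
  "```{r}",
  "x <- 1:10",
  "summary(x)",
  "```"
)
rmd_path <- file.path(tempdir(), "minimal.Rmd")
writeLines(rmd_lines, rmd_path)

# Rendering needs pandoc (RStudio bundles it; otherwise install it separately)
out <- NULL
if (rmarkdown::pandoc_available()) {
  out <- rmarkdown::render(rmd_path, output_dir = tempdir(), quiet = TRUE)
}
out  # path to the generated HTML file, or NULL if pandoc was not found
```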

*actually supports up to 52 language engines including Python, Julia, C++, MySQL, bash, etc
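To use another engine, you swap `r` in the chunk header for the engine name, for example `{python}` or `{bash}`. Running such chunks still requires the corresponding interpreter or tool to be installed. The sketch below lists the engines registered with your installed knitr. The exact count depends on the knitr version, so it may not match the figure above.

```r
# List the language engines registered with the installed version of knitr
engines <- names(knitr::knit_engines$get())
length(engines)  # the exact count depends on the knitr version
c("python", "bash", "sql", "Rcpp") %in% engines
```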

YAML header

---
title: "Rmarkdown Quickstart"
author: "Adele Barugahare"
date: "27/08/2019"
output:
  ioslides_presentation:
    df_print: "paged"
  html_document:
    df_print: "paged"
    toc: true
    toc_depth: 2
  pdf_document:
    number_sections: true
    df_print: "kable"
---
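This header lists three output formats. A plain `rmarkdown::render()` call uses the first one (here `ioslides_presentation`), while `output_format = "all"` renders every listed format. The sketch below saves the header above into a throwaway `.Rmd` file and parses it with `rmarkdown::yaml_front_matter()` to show how the nested options are read. The render call is left commented out because PDF output also needs a LaTeX installation.

```r
# The YAML header above, saved into a throwaway .Rmd file
rmd_lines <- c(
  "---",
  'title: "Rmarkdown Quickstart"',
  'author: "Adele Barugahare"',
  'date: "27/08/2019"',
  "output:",
  "  ioslides_presentation:",
  '    df_print: "paged"',
  "  html_document:",
  '    df_print: "paged"',
  "    toc: true",
  "    toc_depth: 2",
  "  pdf_document:",
  "    number_sections: true",
  '    df_print: "kable"',
  "---",
  "",
  "Body text goes here."
)
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# Parse only the YAML front matter into a nested R list
front <- rmarkdown::yaml_front_matter(rmd_path)
names(front$output)                   # one entry per output format
front$output$html_document$toc_depth  # nested options become list elements

# Render every format listed under `output:` (the PDF also needs LaTeX):
# rmarkdown::render(rmd_path, output_format = "all")
```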

Code chunks

```{r, chunk_options}
# Code analysis goes here
x <- 1:10
y <- x * 2

plot(x, y)
# etc
```
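Options written in a chunk header apply to that chunk only. For example, a header of `{r scatter, echo=FALSE, fig.width=6}` would hide the code and set the figure width in inches. For document-wide defaults, `knitr::opts_chunk$set()` is commonly called in a setup chunk at the top of the file. The option values below are illustrative only.

```r
# Usually placed in a setup chunk near the top of the .Rmd file,
# e.g. one whose header is {r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,      # show code alongside its output
  message = FALSE,  # hide messages such as package start-up notes
  fig.width = 6,    # default figure width, in inches
  fig.height = 4    # default figure height, in inches
)

# Check the current default for one option
knitr::opts_chunk$get("fig.width")
```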

R Markdown: Let’s Play

GitHub: aabarug

Repo: quickstart_rmarkdown_sept_2019

R Data Analysis Toolbox

  • Tidyverse - an opinionated collection of R packages designed for data science: ggplot2, dplyr, magrittr, tidyr, readr, etc
    • ggplot2 - extensive plotting package
  • Shiny - build interactive web applications/dashboards
  • R Markdown - document generation

Tabular data is represented as data frames, a built-in class
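The sketch below uses only base R. A data frame is a list of equal-length column vectors, and columns can hold different types. The `scores` values are made up for illustration, and `mtcars` is an example data frame that ships with R.

```r
# A data frame is a list of equal-length column vectors (values are made up)
scores <- data.frame(
  student = c("A", "B", "C"),
  score   = c(72, 85, 90)
)
str(scores)          # column names and types
scores$score         # extract one column as a vector
dim(mtcars)          # built-in example data frame: rows, columns
```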

Tidyverse

  • ‘Modern’ way of writing R, geared towards data science
  • Fixes some quirky behaviour from base R
  • Improved data-frames - tibbles
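Two of those base R quirks are easy to demonstrate: partial matching of column names with `$`, and single-column subsetting that silently drops to a vector. The sketch below contrasts a `data.frame` with a tibble. The column name `abc` is arbitrary.

```r
library(tibble)

df <- data.frame(abc = 1:3)
tb <- tibble(abc = 1:3)

# Partial matching with `$`: base R returns the `abc` column when asked for `a`
# (no warning under default options), whereas a tibble returns NULL and warns
df$a
tb$a

# Single-column subsetting: a data frame drops to a plain vector,
# a tibble stays a tibble
class(df[, "abc"])
class(tb[, "abc"])
```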

Tidy data:

  1. Each variable is in a column.
  2. Each observation is a row.
  3. Each value is a cell.
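In the untidy table below, one variable (month) is spread across column names. `tidyr::pivot_longer()` turns it into one column per variable and one row per observation. The route names and counts are made-up placeholders. Note that `pivot_longer()` arrived in tidyr 1.0.0. The session shown later in this talk used tidyr 0.8.3, where `gather()` did the same job.

```r
library(tibble)
library(tidyr)

# Untidy: the month variable is spread across column names (made-up values)
wide <- tibble(
  route = c("Route A", "Route B"),  # hypothetical route names
  Jan = c(10, 4),
  Feb = c(12, 6)
)

# Tidy: one column per variable (route, month, flights), one row per observation
long <- pivot_longer(wide, cols = c(Jan, Feb),
                     names_to = "month", values_to = "flights")
long
```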

dplyr

  • is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges
  • each verb takes a data frame as input and returns a modified version of it
  • the idea is that complex operations can be performed by stringing together a series of simpler operations in a pipeline.

input       +--------+       +--------+       +--------+      result
data   %>%  |  verb  |  %>%  |  verb  |  %>%  |  verb  |  ->  data
frame       +--------+       +--------+       +--------+      frame
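The sketch below applies the core verbs one at a time, without a pipe, to the built-in `mtcars` data frame. It shows that each verb takes a data frame as its first argument and returns a new data frame. The filter condition and column choices are arbitrary.

```r
library(dplyr)

# Each verb takes a data frame as its first argument and returns a new one;
# the input is never modified in place
four_cyl <- filter(mtcars, cyl == 4)                    # keep matching rows
slim     <- select(four_cyl, mpg, cyl, wt)              # keep chosen columns
ratio    <- mutate(slim, mpg_per_wt = mpg / wt)         # add a derived column
sorted   <- arrange(ratio, desc(mpg))                   # reorder rows
overall  <- summarise(sorted, mean_mpg = mean(mpg), n = n())  # collapse to one row
overall
```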

%>% - pipe operator that passes the output of one function as the first argument of the next (magrittr package)
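Here is the same computation written two ways on the built-in `mtcars` data: as a pipeline, and as nested calls. `x %>% f(y)` is equivalent to `f(x, y)`, so both forms give an identical result. The pipeline reads top to bottom rather than inside out. Since R 4.1.0, base R also has a native pipe, `|>`, that works similarly for cases like this.

```r
library(dplyr)  # re-exports %>% from magrittr

# Piped: read as "take mtcars, then filter, then group, then summarise"
piped <- mtcars %>%
  filter(cyl == 4) %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg))

# The same computation written as nested calls, read inside out
nested <- summarise(group_by(filter(mtcars, cyl == 4), gear), mean_mpg = mean(mpg))

identical(piped, nested)
```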

*Dataset* > Manipulate to extract information > Plot > Communicate

Read in data:

## ── Attaching packages ───────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.0     ✔ purrr   0.3.2
## ✔ tibble  2.1.1     ✔ dplyr   0.8.1
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## Parsed with column specification:
## cols(
##   Route = col_character(),
##   Departing_Port = col_character(),
##   Arriving_Port = col_character(),
##   Airline = col_character(),
##   Month = col_double(),
##   Sectors_Scheduled = col_double(),
##   Sectors_Flown = col_double(),
##   Cancellations = col_double(),
##   Departures_On_Time = col_double(),
##   Arrivals_On_Time = col_double(),
##   Departures_Delayed = col_double(),
##   Arrivals_Delayed = col_double(),
##   Year = col_double(),
##   Month_Num = col_double()
## )
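The code behind this output is not included in this text version of the slides. The messages come from `library(tidyverse)` and readr's `read_csv()`. A sketch of the reading step follows. It assumes a hypothetical CSV path and writes a header matching the column specification above, plus one made-up placeholder row, so that it runs on its own. Passing `col_types` explicitly records the expected schema and stops readr from guessing it.

```r
library(readr)

# Hypothetical file: the talk's actual CSV path is not given in this text
csv_path <- tempfile(fileext = ".csv")

# Header taken from the column specification above; the single data row is a
# made-up placeholder so that this sketch runs end to end
writeLines(c(
  paste0("Route,Departing_Port,Arriving_Port,Airline,Month,Sectors_Scheduled,",
         "Sectors_Flown,Cancellations,Departures_On_Time,Arrivals_On_Time,",
         "Departures_Delayed,Arrivals_Delayed,Year,Month_Num"),
  "Adelaide-Brisbane,Adelaide,Brisbane,Example Air,1,100,98,2,90,88,8,10,2008,1"
), csv_path)

# Declaring col_types up front documents the expected schema instead of
# relying on readr's guesses
flights <- read_csv(csv_path, col_types = cols(
  Route          = col_character(),
  Departing_Port = col_character(),
  Arriving_Port  = col_character(),
  Airline        = col_character(),
  .default       = col_double()   # every other column is numeric
))

spec(flights)
```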

The dataset:

Australian domestic airlines on-time performance dataset, with information from 2004 to 2019 (80615 rows and 14 columns), from http://data.gov.au/

Dataset > *Manipulate to extract information* > Plot > Communicate

Extract route ‘Adelaide-Brisbane’ in the year 2008 & fix up the month column
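The talk's code for this step is not in the text, so here is a hedged dplyr sketch. It runs on a few made-up placeholder rows that reuse column names from the dataset. What "fix up the month column" meant is not stated. The sketch assumes it means turning `Month_Num` into month labels that sort in calendar order, which is only an illustration.

```r
library(dplyr)

# Made-up placeholder rows that reuse column names from the airline dataset
flights <- tibble::tibble(
  Route         = c("Adelaide-Brisbane", "Adelaide-Brisbane",
                    "Adelaide-Brisbane", "Sydney-Melbourne"),
  Year          = c(2008, 2008, 2009, 2008),
  Month_Num     = c(2, 1, 1, 1),
  Cancellations = c(3, 1, 0, 5)
)

adl_bne_2008 <- flights %>%
  filter(Route == "Adelaide-Brisbane", Year == 2008) %>%
  # Assumption: "fixing" the month means a factor whose levels follow calendar
  # order, so months sort and plot Jan..Dec rather than alphabetically
  mutate(Month = factor(month.abb[Month_Num], levels = month.abb)) %>%
  arrange(Month)

adl_bne_2008
```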

Dataset > Manipulate to extract information > *Plot* > Communicate
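The talk's plot itself does not survive in this text version. Below is a minimal ggplot2 sketch of the plotting step, built from made-up placeholder values shaped like monthly route data. A ggplot is assembled from data, aesthetic mappings (`aes()`) and one or more layers (`geom_*()`), added together with `+`.

```r
library(ggplot2)

# Made-up placeholder values standing in for filtered route data
monthly <- data.frame(
  Month = factor(c("Jan", "Feb", "Mar"), levels = month.abb),
  Cancellations = c(1, 3, 2)
)

p <- ggplot(monthly, aes(x = Month, y = Cancellations)) +
  geom_col() +                                       # one bar per month
  labs(title = "Cancellations per month (placeholder data)")
print(p)
```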

Packages For Interactivity

Plotly

Build interactive plots directly with plot_ly(), or use plotly as a wrapper around a ggplot object with ggplotly()
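Here are both routes on the built-in `mtcars` data: building a plot directly with `plot_ly()`, and converting an existing ggplot with `ggplotly()`. The variables plotted are arbitrary. The widgets are not printed in this sketch, because printing one outside an R Markdown document opens a viewer or browser.

```r
library(ggplot2)
library(plotly)

# Option 1: build the interactive plot directly with plot_ly()
direct <- plot_ly(mtcars, x = ~wt, y = ~mpg, type = "scatter", mode = "markers")

# Option 2: build a ggplot as usual, then convert it with ggplotly()
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()
converted <- ggplotly(p)

# Both are htmlwidgets: printing them (or leaving them as the last line of
# an R Markdown chunk) shows a plot with hover tooltips, zoom and pan
```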

R Markdown Outputs

Supported Document Outputs

  • Webpages
  • R Notebooks
  • PDFs
  • Slideshows
  • Books
  • Websites

…and more created by the R community

Packages that further build on top of R Markdown

  • blogdown - combines R Markdown & Hugo to create general purpose websites
  • bookdown - authoring books, theses, software manuals, etc
  • flexdashboard - HTML outputs with dashboard layouts
  • xaringan - slideshows with remark.js

You can also embed Shiny apps in R Markdown documents (by adding runtime: shiny to the YAML header)
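Here is a minimal Shiny app in which a slider controls how many random points are drawn. The input and output IDs (`n`, `scatter`) are arbitrary names chosen for this sketch. In an interactive document you can also place inputs and `render*()` calls directly in chunks instead of building a full app object. The embedding note in the final comment is hedged accordingly.

```r
library(shiny)

# A tiny app: a slider controls how many random points are drawn
ui <- fluidPage(
  sliderInput("n", "Number of points", min = 10, max = 100, value = 50),
  plotOutput("scatter")
)

server <- function(input, output) {
  output$scatter <- renderPlot({
    plot(rnorm(input$n), rnorm(input$n), xlab = "x", ylab = "y")
  })
}

app <- shinyApp(ui, server)
# In a document with runtime: shiny, a chunk that ends with `app` should embed
# the app; in a plain R session, runApp(app) launches it instead
```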

R Markdown & Analysis Reproducibility

Document what you’ve done with your data in code

R Markdown can render multiple different document types from one Rmd file

The more places (files) an analysis is spread across, the more work it is to keep all of it accurate and up-to-date.

R Markdown allows you to focus on generating content & doing your analysis without (hopefully) spending too much time fighting your document itself*

*the more a document is geared towards a particular output type, the harder it is to neatly convert between formats

Neat Examples